Abstract. A first step towards more reliable software is to execute each
statement and each control-flow path in a method once. In this paper,
we present a formal method to automatically compute test cases for
this purpose based on the idea of bounded infeasible code detection.
The method first unwinds all loops in a program finitely often and then
encodes all feasible executions of the loop-free program in a logical
formula. Helper variables are introduced such that a theorem prover can
reconstruct the control-flow path of a feasible execution from a
satisfying valuation of this formula. Based on this formula, we present one
algorithm that computes a feasible path cover and one algorithm that
computes a feasible statement cover. We show that the algorithms are
complete for loop-free programs and that they can be implemented
efficiently. We further provide a sound algorithm to compute procedure
summaries, which makes the method scalable to larger programs.

1 Introduction 

Using static analysis to find feasible executions of a program that pass a
particular subset of program statements is an interesting problem. Even though
it is undecidable in general, there is ongoing research effort to develop
algorithms and tools that are able to solve this problem for a reasonably large
number of cases. Such tools can be used, e.g., to automatically generate test
cases that cover large portions of a program's source code and trigger rare
behavior, or to identify program fragments for which no suitable test case can
be found. The latter case is sometimes referred to as infeasible code
detection [3]. Code is considered to be infeasible if no terminating execution
can be found for it. Infeasible code can be seen as a superset of unreachable
code, as there might be executions reaching a piece of infeasible code which,
however, fail during their later execution.

In particular, a counterexample for the infeasibility of a piece of code is a 
terminating execution that executes this code. That is, finding a set of test cases 
that cover all statements in a program is equivalent to proving the absence of 
infeasible code. Existing approaches to detect infeasible code do not yet exploit 
the fact that counterexamples for infeasibility might constitute feasible test cases. 

In this paper, we discuss a bounded approach towards infeasible code
detection that generates test cases that cover all statements which have
feasible executions within a given (bounded) number of loop unwindings. The
interesting aspect of bounded infeasible code detection over existing
(unbounded) approaches is that counterexamples for infeasibility are likely to
represent actual executions of the program, as compared to the unbounded case,
where these counterexamples might be introduced by the necessary
over-approximation of the feasible executions.

The paper proposes two novel ideas. The first is the concept of a reachability
verification condition, a formula representation of the program which, similar
to the weakest-liberal precondition or the strongest postcondition, models all
feasible executions of a program. In contrast to existing concepts, a
satisfying assignment to the reachability verification condition can directly
be mapped to an execution of the program from source to sink. For example, a
valuation of wlp can represent a feasible execution starting from any point in
a program, but this does not yet imply that this point is actually reachable
from the initial states of the program. There are ways to encode the desired
property using wlp or sp by adding helper variables to the program (see, e.g.,
[15,3]). However, we claim that the proposed reachability verification
condition provides a better formal basis to show the absence of infeasible
code, as it can, e.g., make better use of the theorem prover stack, which
results in a more efficient and scalable solution.
We suggest two algorithms to compute feasible executions of a program based
on the reachability verification condition. One uses so-called blocking clauses
to prevent the theorem prover from exercising the same path twice. The other
algorithm uses enabling clauses to urge the theorem prover to consider a
solution that passes program fragments that have not been accessed before. Both
algorithms return a set of feasible executions in the bounded program. Further,
both algorithms guarantee that any statement not executed by these test cases
is infeasible within the given bounds. We carry out a preliminary evaluation of
our algorithms against existing algorithms to detect infeasible code.

Based on the reachability verification condition, as a second novelty, we pro- 
pose a technique to compute procedure summaries for bounded infeasible code 
detection. As the presented algorithms return a set of feasible executions, we 
can extract pairs of input and output values for each execution to construct 
procedure summaries. The summaries are a strict under-approximation of the 
possible executions of the summarized procedure. Therefore, the computed sum- 
maries are sound to show the presence of feasible executions, but unsound to 
show their absence. To overcome this gap, we suggest an on-demand computa- 
tion of summaries if no feasible execution can be found with the given summary. 
Within the scope of this paper, we do not evaluate the concept of summaries as 
more implementation effort is required until viable results can be presented. 

In Section 3 we explain how we address the problem of computing the 
weakest-liberal preconditions for general programs. In Section 4 we show how 
a feasible execution that visits certain blocks can efficiently be expressed as a 
formula and introduce the concept of reachability verification condition. In Sec- 
tion 5 we present two different algorithms to address the problem of generating 
test cases with optimal coverage. In Section 6 we show how procedure sum- 
maries can be computed with our test case generation algorithm. We present an 
experimental evaluation of our algorithms in Section 7.

2 Preliminaries



For simplicity, we consider only simple unstructured programs written in the 
language given in Figure 1. 



Program   ::= Procedure+
Procedure ::= proc ProcName(VarId*) [returns VarId] { Block+ }
Block     ::= label : Stmt* [ goto label+ ; ]
Stmt      ::= VarId := Expr; | assume Expr;
            | VarId := call ProcName(Expr*);

Fig. 1. Simple (unstructured) Language 



Expressions are sorted first-order logic terms of appropriate sort. The
expression after an assume statement has Boolean sort. A program is given by a
set of procedures, each with a unique name. The special procedure named "main"
is the entry point of a program. Every procedure contains at least one block of
code. A block consists of a label, a (possibly empty) sequence of statements,
and a non-deterministic goto statement that lists transitions to successor
blocks. The goto statement is omitted for blocks that have no successors. A
statement can either be an assignment of a term to a variable, an assumption,
or a procedure call. A call to a procedure is indicated by the call keyword
followed by the name of the procedure to call and the (possibly empty) list of
arguments. A procedure can return a value by writing into the variable
mentioned in the returns declaration. If this declaration is omitted, the
procedure cannot return a value. If the conditional of an assumption evaluates
to false, the execution blocks. Figure 2 shows a small example of our simple
language.

We assume that every procedure contains a unique initial block $Block_0$ and a
unique final block that has no successor. A procedure terminates if it reaches
the end of the final block. A program terminates if the "main" procedure
terminates. We further assume that the directed graph given by the transitions
between the blocks is reducible.

The presented language is simple, yet expressive enough to encode high-level
programming languages such as C [7]. In this paper we do not address the
problems that can arise during this translation and refer to related work
instead.

The weakest-liberal precondition [9,2] semantics of our language is defined in
the standard way.

proc foo(x, y) returns z {
10:
    goto 11, 12;
11: assume y > 0;
    z := x + y;
    goto 13;
12: assume y <= 0;
    z := x - y;
    goto 13;
13:
}

proc main() {
10: r := call foo(0, 1);
}

Fig. 2. Example of our Simple Language
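
For reference, the rules below sketch the weakest-liberal precondition as it is commonly given in the literature for assignments, assumptions, sequential composition, and non-deterministic gotos. This is an assumed presentation, not necessarily the exact table intended here; procedure calls are assumed to be inlined or summarized first (Section 3), and $l_1, \ldots, l_k$ are hypothetical names for the goto targets.

```latex
\begin{align*}
wlp(x := e,\; Q) &= Q[x/e] \\
wlp(\textbf{assume } e,\; Q) &= e \Rightarrow Q \\
wlp(S;\, T,\; Q) &= wlp(S,\; wlp(T,\; Q)) \\
wlp(\textbf{goto } l_1, \ldots, l_k,\; Q) &= \bigwedge_{1 \le j \le k} wlp(Block_{l_j},\; Q)
\end{align*}
```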

A sequence of statements st in our language has a feasible execution if and only
if there exists an initial valuation $\mathcal{V}$ of the program variables such
that in the execution of st all assume statements are satisfied.

Theorem 1. A sequence of statements st has a feasible execution if and only if
there exists a valuation $\mathcal{V}$ of the program variables such that
$\mathcal{V} \not\models wlp(st, \mathit{false})$.

Hence, the initial state of a feasible execution of st can be derived from a
counterexample to the formula representation of the weakest-liberal
precondition wlp(st, false).
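
As a small worked example of Theorem 1, consider the statements of block 11 in Figure 2, st = assume y > 0; z := x + y. Assuming the usual rules $wlp(x := e, Q) = Q[x/e]$ and $wlp(\textbf{assume } e, Q) = e \Rightarrow Q$, the computation runs roughly as follows:

```latex
\begin{align*}
wlp(\textbf{assume } y > 0;\; z := x + y,\; \mathit{false})
  &= wlp(\textbf{assume } y > 0,\; wlp(z := x + y,\; \mathit{false})) \\
  &= wlp(\textbf{assume } y > 0,\; \mathit{false}) \\
  &= (y > 0 \Rightarrow \mathit{false}) \\
  &= (y \le 0)
\end{align*}
```

Any valuation with y > 0, for instance x = 0 and y = 1, falsifies this formula and is therefore the initial state of a feasible execution of st.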

A path in a program is a sequence of blocks $\pi = Block_0 \ldots Block_n$ such
that there is a transition from any $Block_i$ to $Block_{i+1}$ for
$0 \le i < n$. We extend the definition of feasible executions from statements
to paths by concatenating the statements of each block. We say that a path
$\pi$ is a complete path if it starts in the initial block and ends in the final
block. In the following, we always refer to complete paths unless explicitly
stated otherwise. A path is feasible if there exists a feasible execution for
that path.

Theorem 2. Let $\pi = Block_0 \ldots Block_n$ be a path in a program $P$, where
$st_i$ represents the statements of $Block_i$. The path $\pi$ is feasible if and
only if there exists a valuation $\mathcal{V}$ of the program variables such
that $\mathcal{V} \not\models wlp(st_0; \ldots; st_n, \mathit{false})$.

Note that our simple language does not support assertions. For the weakest
liberal precondition, assertions are treated in the same way as assumptions.
That is, we might render a path infeasible because its execution fails, even
though this path might be executable. As our goal is to execute all possible
control-flow paths, we encode assertions as conditional choice. This allows us
later on to check if there exist test cases that violate an assertion.

3 Program Transformation



As the weakest-liberal precondition cannot be computed for programs with loops 
in the general case, an abstraction is needed. Depending on the purpose of the 
analysis, different information about the possible executions of the program has 
to be preserved to retain soundness. E.g., when proving partial correctness [1,2] 
of a program, the set of all executions that fail has to be preserved (or might be 
over-approximated), while terminating or blocking executions might be omitted 
or added. 

For our purpose of identifying a set of executions containing all feasible
statements, an abstraction that over-approximates the executions of a program
is not suitable, as we might report executions which do not exist in the
original program. Instead, we need a loop unwinding which does not add any
(feasible) executions.

Loop Unwinding. Our loop unwinding technique is sketched in Figure 3. As we
assume (w.l.o.g.) that the control-flow graph of our input program is
reducible, we can identify one unique entry point for each loop, the loop
header $B_h$, and a loop body $B$. The loop header contains only a transition
to the loop body and the loop exit $B_e$. We can now unwind the loop once by
simply redirecting the target of the back-edge that goes from $B$ to $B_h$ to
$B_e$ (and thus transforming the loop into an if-then-else).

Fig. 3. Finite loop unwinding

To unwind the loop k times, for each unwinding we have to create a copy of
$B$ and $B_h$, and redirect the outgoing edge of the $B$ introduced in the
previous unwinding to the newly introduced $B_h$. That is, the loop is
transformed into an if-then-else tree of depth k. This abstraction is limited
to finding executions that reach statements within less than (k + 1) loop
iterations. However, as the abstraction never adds a feasible execution, we
have the guarantee that any execution it finds really exists.
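
The unwinding step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the control-flow graph is a hypothetical dictionary from block labels to successor lists, the loop body is a single block whose only outgoing edge is the back-edge, and the names `unwind`, `is_loop_free`, and the `#i` suffix for copies are invented here.

```python
def unwind(cfg, header, body, exit_, k):
    """Unwind the loop header -> body -> header k times (k >= 1).

    cfg maps a block label to its list of successor labels. The sketch
    assumes the loop body is a single block whose only edge is the back-edge.
    """
    out = {label: list(succ) for label, succ in cfg.items()}
    # First unwinding: redirect the back-edge body -> header to the loop exit.
    out[body] = [exit_ if s == header else s for s in out[body]]
    previous_body = body
    for i in range(1, k):
        header_i, body_i = f"{header}#{i}", f"{body}#{i}"
        out[header_i] = [body_i, exit_]   # copy of the loop header B_h
        out[body_i] = [exit_]             # copy of the body B, leaving via B_e
        # The body of the previous unwinding now continues in the new header.
        out[previous_body] = [header_i]
        previous_body = body_i
    return out


def is_loop_free(cfg):
    """Depth-first search for a back-edge; True if the graph is acyclic."""
    state = {}

    def visit(label):
        state[label] = "open"
        for s in cfg[label]:
            if state.get(s) == "open":
                return False
            if s not in state and not visit(s):
                return False
        state[label] = "done"
        return True

    return all(visit(label) for label in cfg if label not in state)


# A loop "while (...) { b }" between an entry block and the exit block e.
LOOP = {"entry": ["h"], "h": ["b", "e"], "b": ["h"], "e": []}
UNWOUND = unwind(LOOP, "h", "b", "e", 2)
```

With k = 2 the result contains the original header and body plus one copy of each, and no path executes the body more than twice.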

Lemma 1. Let $P$ be a program and $P'$ a program which is generated from $P$ by
k-times loop unwinding. Any feasible execution of $P'$ is also a feasible
execution of $P$.

Procedure Calls. Procedure calls are another problem when computing the weak- 
est (liberal) precondition. First, they can introduce looping control flow via re- 
cursion, and second, inlining each procedure call might dramatically increase the 
size of the program that has to be considered. For recursive procedure calls, we 
can apply the same loop unwinding used for normal loops. 

To inline a procedure, we split the block at the location of the procedure 
call in two blocks and add all blocks of the body of the called procedure in 
between (and rename variables and labels if necessary). Then, we add additional 
assignments to map the parameters of the called procedure to the arguments 
used in the procedure call and the variable carrying the return value of the 
procedure to those receiving it in the calling procedure. 

If inlining all procedure calls is not feasible due to the size of the program, the 
call has to be replaced by a summary of the procedure body instead. We propose 
a technique that retains the soundness from Lemma 1 later on in Section 6. 

Single Static Assignment. For the resulting loop-free program, we perform a
single static assignment transformation [8] which introduces auxiliary
variables to ensure that each program variable is assigned at most once on each
execution path [12]. For convenience we use the following notation: given a
program variable v, the single static assignment transformation transforms an
assignment v := v + 1 into $v_{i+1} := v_i + 1$, where $v_{i+1}$ and $v_i$ are
auxiliary variables (and the index represents the incarnation of v). In the
resulting program, each variable is written at most once. Hence, we can replace
all assignments by assumptions without altering the feasible executions of the
program. In that sense, the transformed program is passive, as it does not
change the values of variables. As single static assignment is used frequently
in verification, we refer to the related work for more details (e.g.,
[12,19,2]).
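
For a single straight-line block, the transformation into passive form can be sketched as follows. The statement encoding (tuples with token lists) and the function name `passify` are assumptions made for this illustration only; join points of the control flow, where incarnations from different branches have to be merged, are deliberately left out.

```python
def passify(stmts, variables):
    """Rewrite a straight-line statement list into passive form.

    stmts is a list of ("assume", tokens) or ("assign", target, tokens),
    where tokens is a list of variable names, literals, and operators.
    Every variable starts with incarnation 0.
    """
    version = {v: 0 for v in variables}

    def rename(tokens):
        # Replace each variable by its current incarnation, e.g. v -> v1.
        return " ".join(f"{t}{version[t]}" if t in version else t for t in tokens)

    out = []
    for kind, *rest in stmts:
        if kind == "assume":
            out.append(f"assume {rename(rest[0])};")
        else:
            target, expr = rest
            rhs = rename(expr)        # read the old incarnations first
            version[target] += 1      # then introduce a fresh incarnation
            out.append(f"assume {target}{version[target]} == {rhs};")
    return out


PASSIVE = passify(
    [("assume", ["y", ">", "0"]),
     ("assign", "v", ["v", "+", "1"]),
     ("assign", "v", ["v", "+", "y"])],
    ["v", "y"],
)
```

Each assignment becomes an assumption that relates a fresh incarnation to the previous ones, so the resulting statements no longer change any variable.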

4 Reachability Verification Condition 

This section explains how to find a formula VC (the reachability verification
condition) such that every satisfying valuation $\mathcal{V}$ corresponds to a
terminating execution of the program. Moreover, it is possible to determine
from the valuation which blocks of the program were reached by this execution.
For this purpose, VC contains an auxiliary variable $R_i$ for each block that
is true if the block is visited by the execution. From such an execution we can
derive a test case by looking at the initial valuation of the variables.

A test case of a program can be found using the weakest (liberal)
precondition. If a state satisfies the weakest precondition wp(S, true) of a
program S, it will produce a non-failing run. However, it may still block in an
assume statement. Since we desire to find non-blocking test cases, we follow
[3] and use the weakest liberal precondition of false. A state satisfies
wlp(S, false) if and only if it does not terminate. Hence, we can use
$\neg wlp(S, \mathit{false})$ to find terminating runs of S.

For a loop-free program, computing the weakest (liberal) precondition is
straightforward and has been discussed in many previous articles (e.g.,
[2,19,11,14]). To avoid exponential explosion of the formula size, for each
block

    $Block_i ::= i : S_i; \textbf{goto } Succ_i$

we introduce an auxiliary variable $B_i$ that represents the formula
$\neg wlp(Block_i, \mathit{false})$, where $Block_i$ is the program fragment
starting at label i and continuing to the termination point of the program.
These variables can be defined as

    $$WLP: \bigwedge_{0 \le i < n} \Big( B_i \equiv \neg wlp\big(S_i, \bigwedge_{j \in Succ_i} \neg B_j\big) \Big) \;\wedge\; \big( B_n \equiv \neg wlp(S_n, \mathit{false}) \big).$$

Introducing the auxiliary variables avoids copying the wlp of the successor
blocks. If we are interested in a terminating execution that starts in the
initial location 0, we can find a satisfying valuation for

    $$WLP \wedge B_0.$$

Lemma 2. There is a satisfying valuation $\mathcal{V}$ for the formula WLP with
$\mathcal{V}(B_i) = \mathit{true}$ if and only if there is a terminating
execution for the program fragment starting at the block $Block_i$.

The proof is given in [3].

Thus a satisfying valuation $\mathcal{V}$ of $WLP \wedge B_0$ corresponds to a
terminating execution of the whole program. Moreover, if $\mathcal{V}(B_i)$ is
true, the same valuation also corresponds to a terminating execution starting
at the block $Block_i$. However, it does not mean that there is an execution
that starts in the initial state, visits the block $Block_i$, and then
terminates. This is because the formula does not encode that $Block_i$ is
reachable from the initial state.

To overcome this problem one may use the strongest postcondition to compute
the states for which $Block_i$ is reachable. This roughly doubles the formula.
In our case there is a simpler check for reachability. Again, we introduce an
auxiliary variable $R_i$ for every block label i that holds if the execution
reaches $Block_i$ from the initial state and terminates. Let $Pre_i$ be the set
of predecessors of $Block_i$, i.e., the set of all j such that the final goto
instruction of $Block_j$ may jump to $Block_i$. Then we can fix the auxiliary
variables $R_i$ using WLP as follows:

    $$VC: WLP \;\wedge\; (R_0 \equiv B_0) \;\wedge\; \bigwedge_{1 \le i \le n} \Big( R_i \equiv B_i \wedge \bigvee_{j \in Pre_i} R_j \Big).$$

That is, the reachability variable of the initial block is set to true if the
run is terminating. The reachability variable of other blocks is set to true if
the current valuation describes a normally terminating execution starting at
this block and at least one predecessor has its reachability variable set to
true.
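
To make the roles of the two kinds of auxiliary variables concrete, the following sketch evaluates them for a given valuation instead of handing the formula to a theorem prover. It assumes a hypothetical encoding in which each block of a passive, loop-free procedure is a pair of a guard (the conjunction of its assume statements) and a successor list, with labels numbered in topological order; `FOO` is an assumed passive encoding of foo from Figure 2.

```python
def reach_vars(blocks, s):
    """Evaluate the auxiliary variables B_i and R_i under the valuation s.

    blocks maps a label to (guard, successors). Labels are assumed to be
    numbered in topological order, with the smallest label as initial block.
    """
    order = sorted(blocks)
    B, R = {}, {}
    for i in reversed(order):
        guard, succ = blocks[i]
        # B_i: the assumptions of block i hold and, unless the block is
        # final, at least one successor can still reach termination.
        B[i] = guard(s) and (not succ or any(B[j] for j in succ))
    for i in order:
        preds = [j for j in order if i in blocks[j][1]]
        # R_0 = B_0; otherwise R_i = B_i and some predecessor is reached.
        R[i] = B[i] and (i == order[0] or any(R[j] for j in preds))
    return B, R


# Assumed passive encoding of foo from Figure 2.
FOO = {
    0: (lambda s: True, [1, 2]),
    1: (lambda s: s["y"] > 0 and s["z"] == s["x"] + s["y"], [3]),
    2: (lambda s: s["y"] <= 0 and s["z"] == s["x"] - s["y"], [3]),
    3: (lambda s: True, []),
}

# A chain 0 -> 1 -> 2 whose first block blocks for x <= 0.
CHAIN = {
    0: (lambda s: s["x"] > 0, [1]),
    1: (lambda s: True, [2]),
    2: (lambda s: True, []),
}

B_FOO, R_FOO = reach_vars(FOO, {"x": 0, "y": 1, "z": 1})
B_CHAIN, R_CHAIN = reach_vars(CHAIN, {"x": 0})
```

For the chain with x = 0, the variables of blocks 1 and 2 in `B_CHAIN` are true because an execution starting there terminates, while all entries of `R_CHAIN` are false because no execution from the initial state reaches them. This is the gap between WLP and VC described above.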

Theorem 3. There is a valuation $\mathcal{V}$ that satisfies VC with
$\mathcal{V}(R_0) = \mathit{true}$ if and only if the corresponding initial
state leads to a feasible complete path $\pi$ for the procedure. Moreover, the
value of the reachability variable $\mathcal{V}(R_i)$ is true if and only if
there is a path $\pi$ starting in this initial state that visits block
$Block_i$.

Proof. Let there be a feasible path $\pi$ and let $\mathcal{V}$ be the
corresponding valuation for the initial variables. If one sets the value of
each of the auxiliary variables $B_i$ and $R_i$ according to its definition in
VC, then VC is satisfied by $\mathcal{V}$. Moreover, the $B_i$ variables for
every visited block must be true by Lemma 2. Then $\mathcal{V}(R_0)$ must also
be true, i.e., the reachability variable for the initial state must be true. By
induction one can see that $\mathcal{V}(R_i)$ must also be true for every
visited block $Block_i$.

For the other direction, let $\mathcal{V}$ be a satisfying valuation for VC
with $\mathcal{V}(R_0) = \mathit{true}$. Then $\mathcal{V}(B_0) = \mathit{true}$
also holds. Hence, by Lemma 2 this valuation corresponds to a feasible path
$\pi$. Let $\mathcal{V}(R_i) = \mathit{true}$ for some block. If $i = 0$, then
this is the initial block, which is visited by the feasible path $\pi$. For
$i \neq 0$ there is some predecessor $j \in Pre_i$ with
$\mathcal{V}(R_j) = \mathit{true}$. By induction over the order of the blocks
(note that the code is loop-free) one can assume that there is a feasible path
starting in this initial state that visits $Block_j$. Since $Block_j$ ends with
a non-deterministic goto that can jump to $Block_i$, the latter block is
reachable. Moreover, since $R_i$ is true, $B_i$ must also be true, and by
Lemma 2 the valuation corresponds to a terminating run starting at block
$Block_i$. Thus, there is a run that starts at the initial state, reaches block
$Block_i$, and terminates.

Thus VC is the reachability verification condition that can be used to
generate test cases of the program that reach certain blocks. To cover all
statements by test cases, one needs to find a set of valuations for VC such
that each $R_i$ variable is true in at least one valuation. The following
section will tackle this problem.

5 Covering algorithms 

We can now identify feasible executions through a block simply by checking if
the reachability variable associated with this block evaluates to true in a
satisfying valuation of the reachability verification condition. Further, due
to the single static assignment performed before generating the formula, we can
identify the initial values for each variable that are needed to force the
execution of this path. That is, a valuation $\mathcal{V}$ satisfying the VC
can serve as a test case for a block associated with a reachability variable
R if $\mathcal{V}(R) = \mathit{true}$.

Definition 1 (Test Case). Let VC be the reachability verification condition of
a program, let B be a block in this program, and let R be the reachability
variable associated with this block. A test case for the block B is a valuation
$\mathcal{V}$ of VC such that $\mathcal{V} \models VC$ and $\mathcal{V}(R)$ is
true.

In the following we present two algorithms to compute test cases for loop-free
programs. The first algorithm computes a set of test cases to cover all
feasible control-flow paths; the second one computes a more compact set that
only covers all feasible statements.

Path Coverage Algorithm. To efficiently generate a set of test cases that
covers all feasible control-flow paths, we need an algorithm that checks which
combinations of reachability variables in a reachability verification condition
can be set to true. That is, after finding one satisfying valuation for a
reachability verification condition, this algorithm has to modify the next
query in a way that ensures that the same valuation is not computed again.
This procedure has to be repeated until no further satisfying valuation can be
found.



Algorithm 1: AlgPC
Input: VC: A reachability verification condition,
       $\mathcal{R} = \{R_0, \ldots, R_n\}$: The set of reachability variables
Output: $\mathcal{T}$: A set of test cases covering all feasible paths
 1 begin
 2   $\psi \leftarrow VC$
 3   $\mathcal{T} \leftarrow \{\}$
 4   $\mathcal{V} \leftarrow$ checksat($\psi$)
 5   while $\mathcal{V} \neq \{\}$ do
 6     $\mathcal{T} \leftarrow \mathcal{T} \cup \{\mathcal{V}\}$
 7     $\phi \leftarrow \mathit{false}$
 8     foreach $R$ in $\mathcal{R}$ do
 9       if $\mathcal{V}(R) = \mathit{true}$ then
10         $\phi \leftarrow \phi \vee \neg R$
11       else
12         $\phi \leftarrow \phi \vee R$
13       endif
14     endfch
15     $\psi \leftarrow \psi \wedge \phi$
16     $\mathcal{V} \leftarrow$ checksat($\psi$)
17   endw
18   return $\mathcal{T}$
19 end

Algorithm AlgPC, given in Algorithm 1, uses blocking clauses to guarantee
that every valuation is only returned once. The blocking clause is the negated
conjunction of all assignments to reachability variables in a valuation
$\mathcal{V}$. The algorithm uses the oracle function checksat (see lines 4 and
16), which has to be provided by a theorem prover. The function takes a
first-order logic formula as input and returns a satisfying assignment for this
formula in the form of a set of pairs of variable and value for each free
variable in that formula. If the formula is not satisfiable, checksat returns
the empty set.

The algorithm uses a local copy $\psi$ of the reachability verification
condition VC. As long as checksat is able to compute a satisfying valuation
$\mathcal{V}$ for $\psi$, the algorithm adds this valuation to the set of test
cases $\mathcal{T}$ (line 6), and then builds a blocking clause consisting of
the disjunction of the negated reachability variables which are assigned to
true in $\mathcal{V}$ (line 8). The formula $\psi$ is conjoined with this
blocking clause (line 15), and the algorithm starts over by checking if there
is a satisfying valuation for the new formula (line 16). The algorithm
terminates when $\psi$ becomes unsatisfiable.
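
The loop of AlgPC can be sketched in Python as follows. The sketch makes several assumptions that are not part of the paper: `checksat` is a brute-force search over a small hypothetical domain rather than a theorem prover, the program is an assumed passive encoding of foo from Figure 2, the query requires $R_0$ to be true (following Theorem 3), and the blocking clause is represented by remembering the complete assignment to the reachability variables.

```python
from itertools import product

# Assumed passive encoding of foo from Figure 2: label -> (guard, successors).
FOO = {
    0: (lambda s: True, [1, 2]),
    1: (lambda s: s["y"] > 0 and s["z"] == s["x"] + s["y"], [3]),
    2: (lambda s: s["y"] <= 0 and s["z"] == s["x"] - s["y"], [3]),
    3: (lambda s: True, []),
}
DOMAIN = range(-1, 2)  # hypothetical search space for x, y, and z


def reach_vars(blocks, s):
    """Evaluate the reachability variables R_i under the valuation s."""
    order = sorted(blocks)  # labels are assumed to be topologically ordered
    B, R = {}, {}
    for i in reversed(order):
        guard, succ = blocks[i]
        B[i] = guard(s) and (not succ or any(B[j] for j in succ))
    for i in order:
        preds = [j for j in order if i in blocks[j][1]]
        R[i] = B[i] and (i == order[0] or any(R[j] for j in preds))
    return R


def checksat(blocks, blocked):
    """Brute-force stand-in for the oracle: find a valuation with R_0 true
    whose assignment to the reachability variables is not blocked yet."""
    for x, y, z in product(DOMAIN, repeat=3):
        s = {"x": x, "y": y, "z": z}
        R = reach_vars(blocks, s)
        assignment = tuple(R[i] for i in sorted(R))
        if R[0] and assignment not in blocked:
            return s, assignment
    return None


def alg_pc(blocks):
    """Collect one test case per distinct assignment to the R variables."""
    tests, blocked = [], set()
    model = checksat(blocks, blocked)
    while model is not None:
        s, assignment = model
        tests.append((s, assignment))
        blocked.add(assignment)  # blocking clause: exclude this assignment
        model = checksat(blocks, blocked)
    return tests


TESTS = alg_pc(FOO)
```

For this encoding the loop stops after two test cases, one through block 1 and one through block 2, since no other combination of reachability variables is satisfiable.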

Theorem 4 (Correctness of AlgPC). Let P be a loop-free and passive program
with verification condition VC, and let $\mathcal{R}$ be the set of
reachability variables used in VC. Algorithm AlgPC, started with the arguments
VC and $\mathcal{R}$, terminates and returns a set $\mathcal{T}$. For any
feasible and complete path $\pi$ there is a test case in $\mathcal{T}$ for this
path.

Proof. There are only finitely many solutions for the variables $\mathcal{R}$
that will satisfy the formula VC. Due to the introduction of the blocking
clause, every solution will be found only once. Hence, after finitely many
iterations the formula $\psi$ must be unsatisfiable and the algorithm
terminates. If $\pi$ is a feasible and complete path, then by Theorem 3 there
is a valuation $\mathcal{V}$ with $\mathcal{V}(R) = \mathit{true}$ for every
block visited by $\pi$. Such a valuation must be found by the algorithm before
a corresponding blocking clause is inserted into $\psi$. The corresponding test
case is then inserted into $\mathcal{T}$ and is a test case for $\pi$.

Note that AlgPC is complete for loop-free programs. For arbitrary programs 
that have been transformed using the steps from Section 3, the algorithm still 
produces only feasible test cases due to the soundness of the abstraction. 

The advantage of using blocking clauses is that AlgPC does not restrict 
the oracle checksat in how it should explore the feasible paths encoded in 
the reachability verification condition. The drawback of AlgPC is that, for each 
explored path, a blocking clause is added to the formula and thus, the increasing 
size of the formula might slow down the checksat queries if many paths are 
explored. This limits the scalability of our algorithm. In Section 7 we evaluate 
how the performance of AlgPC changes with an increasing size of the input 
program. 

Statement Coverage Algorithm. In some cases one might only be interested in
covering all feasible statements. To avoid exercising all feasible paths, we
present a second algorithm, AlgSC, in Algorithm 2 that computes a compact set
of test cases to cover all feasible statements. Instead of the blocking clauses
that prevent the oracle from computing the same valuation twice, the algorithm
uses enabling clauses. An enabling clause is the disjunction of all
reachability variables that have not been assigned to true by a previous
satisfying valuation of the reachability verification condition.

Algorithm 2: AlgSC
Input: VC: A reachability verification condition,
       $\mathcal{R} = \{R_0, \ldots, R_n\}$: The set of reachability variables
Output: $\mathcal{T}$: A set of test cases covering all feasible statements
 1 begin
 2   $\mathcal{T} \leftarrow \{\}$
 3   $\mathcal{V} \leftarrow$ checksat($VC$)
 4   while $\mathcal{V} \neq \{\}$ do
 5     $\mathcal{T} \leftarrow \mathcal{T} \cup \{\mathcal{V}\}$
 6     foreach $R$ in $\mathcal{R}$ do
 7       if $\mathcal{V}(R) = \mathit{true}$ then
 8         $\mathcal{R} \leftarrow \mathcal{R} \setminus \{R\}$
 9         $\mathcal{R} \leftarrow$ RemoveClones($R$, $\mathcal{R}$)
10       endif
11     endfch
12     $\phi \leftarrow \mathit{false}$
13     foreach $R$ in $\mathcal{R}$ do
14       $\phi \leftarrow \phi \vee R$
15     endfch
16     $\mathcal{V} \leftarrow$ checksat($VC \wedge \phi$)
17   endw
18   return $\mathcal{T}$
19 end



The algorithm takes as input a reachability verification condition VC and
the set of all reachability variables $\mathcal{R}$ used in this formula. Like
AlgPC, AlgSC uses the oracle function checksat. First, it checks if there
exists any satisfying valuation $\mathcal{V}$ for VC. If so, the algorithm adds
$\mathcal{V}$ to the set of test cases (line 5). Then, the algorithm removes all
reachability variables from the set $\mathcal{R}$ which are assigned to true in
$\mathcal{V}$ (line 8). While removing the reachability variables which are
assigned to true, the algorithm also has to check if this reachability variable
corresponds to a block created during loop unwinding. In that case, all clones
of this block are removed from $\mathcal{R}$ as well using the helper function
RemoveClones() (line 9). After that, the algorithm computes a new enabling
clause $\phi$ that equals the disjunction of the remaining reachability
variables in $\mathcal{R}$ (line 13) and starts over by checking if VC in
conjunction with $\phi$ is satisfiable (line 16). That is, the conjunction
$VC \wedge \phi$ restricts the feasible executions in VC to those where at least
one reachability variable in $\mathcal{R}$ is set to true. Note that, if the set
$\mathcal{R}$ is empty, the enabling clause $\phi$ becomes false, and thus the
conjunction with VC becomes unsatisfiable. That is, the algorithm terminates if
all blocks have been visited once, or if there is no feasible execution passing
the remaining blocks.
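
A corresponding sketch of AlgSC is given below, again under assumptions that go beyond the paper: `checksat` is a brute-force search over a small hypothetical domain, the example program `BLOCKS` is invented and contains a deliberately infeasible block 3, and RemoveClones is omitted because the example contains no unwound loops.

```python
DOMAIN = range(-1, 2)  # hypothetical search space for y

# Hypothetical passive program: block 3 can never be executed.
BLOCKS = {
    0: (lambda s: True, [1, 2, 3]),
    1: (lambda s: s["y"] > 0, [4]),
    2: (lambda s: s["y"] <= 0, [4]),
    3: (lambda s: s["y"] > 0 and s["y"] < 0, [4]),
    4: (lambda s: True, []),
}


def reach_vars(blocks, s):
    """Evaluate the reachability variables R_i under the valuation s."""
    order = sorted(blocks)  # labels are assumed to be topologically ordered
    B, R = {}, {}
    for i in reversed(order):
        guard, succ = blocks[i]
        B[i] = guard(s) and (not succ or any(B[j] for j in succ))
    for i in order:
        preds = [j for j in order if i in blocks[j][1]]
        R[i] = B[i] and (i == order[0] or any(R[j] for j in preds))
    return R


def checksat(blocks, enabled):
    """Brute-force stand-in for the oracle. If enabled is a set, it plays the
    role of the enabling clause: some R_i with i in enabled must be true."""
    for y in DOMAIN:
        s = {"y": y}
        R = reach_vars(blocks, s)
        if R[0] and (enabled is None or any(R[i] for i in enabled)):
            return s, R
    return None


def alg_sc(blocks):
    """Return test cases covering all feasible blocks, plus uncovered blocks."""
    remaining = set(blocks)
    tests = []
    model = checksat(blocks, None)            # first query: VC alone
    while model is not None:
        s, R = model
        tests.append(s)
        remaining -= {i for i in remaining if R[i]}
        model = checksat(blocks, remaining)   # VC with the enabling clause
    return tests, remaining


TESTS, UNCOVERED = alg_sc(BLOCKS)
```

Two test cases suffice to cover blocks 0, 1, 2, and 4. The third query fails because only block 3 is left in the enabling clause, so block 3 is reported as not coverable within this search space.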

Theorem 5 (Correctness of AlgSC). Let P be a loop-free and passive program
with reachability verification condition VC, and let $\mathcal{R}$ be the set
of reachability variables used in VC. Algorithm AlgSC, started with the
arguments VC and $\mathcal{R}$, terminates and returns a set $\mathcal{T}$. For
any block in the program there exists a feasible path $\pi$ passing this block
if and only if there exists a test case $\mathcal{V} \in \mathcal{T}$ that
passes this block.

Proof. In every iteration of the loop at least one variable of the set
$\mathcal{R}$ will be removed. This is because the formula $\phi$ will only
allow valuations such that for at least one $R \in \mathcal{R}$ the valuation
$\mathcal{V}(R)$ is true. Since $\mathcal{R}$ contains only finitely many
variables, the algorithm must terminate. If $\pi$ is a feasible path visiting
the block associated with the variable R, then there is a valuation
$\mathcal{V}$ that satisfies VC with $\mathcal{V}(R) = \mathit{true}$. Such a
valuation must eventually be found, since $VC \wedge \phi$ is only unsatisfiable
if $R \notin \mathcal{R}$. The valuation is added to the set of test cases
$\mathcal{T}$.

The benefit of AlgSC compared to AlgPC is that it will produce at most
$|\mathcal{R}|$ test cases, as each iteration of the loop will generate only one
test case and remove at least one element from $\mathcal{R}$. That is, the
resulting set $\mathcal{T}$ can be used more efficiently if only statement
coverage is needed. However, the enabling clause might cause the theorem prover
which realizes checksat to take detours or throw away information which could
be reused. It is not obvious which of the two algorithms will perform better in
terms of computation time. Therefore, in the following, we carry out some
experiments to evaluate the performance of both algorithms.

Note that, like AlgPC, AlgSC is complete for loop-free programs and sound
for arbitrary programs. That is, any block that is not covered by these
algorithms is unreachable code (in the loop-free program).

6 Procedure Summaries 

For large programs, inlining all procedure calls as proposed in Section 3 might 
not be feasible. However, replacing them by using assume-guarantee reasoning 
as it is done, e.g., in static checking [1] is not a feasible solution either. Using 
contracts requires the necessary expertise from the programmer to write proper 
pre- and postconditions, and thus, it would violate our goal of having a fully 
automatic tool. If trivial contracts are generated automatically (e.g., [15]), it 
will introduce feasible executions that do not exist in the original program. This 
would break the soundness requirement from Lemma 1 that each of the test 
cases returned by the algorithms AlgPC and AlgSC must represent a feasible 
path in the (loop-free) program. 

Instead of inlining each procedure call, we propose to replace them by a 
summary of the original procedure which represents some feasible executions 
of the procedure. The summary can be obtained directly by applying AlgPC 
or AlgSC to the body of the called procedure. Each valuation $\mathcal{V}$ in the set T 
returned by these algorithms contains values for all incarnations of the variables 
used in the procedure body on one feasible execution. In particular, for a variable 
$v$ with the first incarnation $v_0$ and the last incarnation $v_n$, $\mathcal{V}(v_0)$ represents one 
feasible input value for the considered procedure and $\mathcal{V}(v_n)$ represents the value 
of $v$ after this procedure returns. That is, given a procedure P with verification 
condition VC and reachability variables $\mathcal{R}$, let T = AlgPC(VC, $\mathcal{R}$) or T = 
AlgSC(VC, $\mathcal{R}$), respectively. Furthermore, let $V$ be the set of variables which 
are visible to the outside of P, that is, parameters and global variables. The 
summary Sum of P is expressed by the formula: 

$$Sum := \bigvee_{\mathcal{V} \in T} \Big( \bigwedge_{v \in V} (v_0 = \mathcal{V}(v_0)) \;\wedge\; \bigwedge_{v \in V} (v_n = \mathcal{V}(v_n)) \Big),$$

where n refers to the maximum incarnation of a particular variable v. The sum- 
mary can be interpreted as encoding each feasible path of P by the condition 
that, if the initial values for each variable are set appropriately, the post-state 
of this execution is established. We need an underapproximation of the feasible 
executions of the procedure as the procedure summary. Therefore we encode the 
summary of the previously computed paths and let the theorem prover choose 
the right path. In practice, in particular when using AlgPC, it can be useful to 
consider only a subset of T for the summary construction, as a formula represent- 
ing all paths might outgrow the actual verification condition of the procedure. 
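
The construction of Sum can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the experiments: valuations are assumed to be dictionaries from incarnation names (such as a0 or c1) to integers, the formula is rendered as a plain string, and the name build_summary is hypothetical. The concrete values in the example are made up for illustration.

```python
def build_summary(valuations, first_inc, last_inc):
    """Render Sum as a disjunction with one disjunct per valuation.

    valuations: list of dicts, incarnation name -> value (assumed format)
    first_inc:  first incarnations of the visible variables, e.g. ["a0", "b0"]
    last_inc:   last incarnations of the visible variables, e.g. ["c1"]
    """
    disjuncts = []
    for val in valuations:
        # Pre-state: the input values chosen on this feasible execution.
        pre = " && ".join(f"{v} == {val[v]}" for v in first_inc)
        # Post-state: the values observed when the procedure returns.
        post = " && ".join(f"{v} == {val[v]}" for v in last_inc)
        disjuncts.append(f"(({pre}) && ({post}))")
    # An empty disjunction is false: no feasible execution is known.
    return " || ".join(disjuncts) if disjuncts else "false"


# Hypothetical valuations for a procedure with inputs a, b and result c.
T = [{"a0": 0, "b0": 0, "c1": -1}, {"a0": 0, "b0": 1, "c1": 1}]
print(build_summary(T, ["a0", "b0"], ["c1"]))
```

Passing only a slice of the valuation list (for instance T[:1]) corresponds to building the summary from a subset of T.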

On the caller side, we can now replace the call to a procedure P by an 
assumption assume Sum, where Sum is the procedure summary of P. We further 
have to add some framing assignments to map the input and output variables 
of the called procedure to those of the calling procedure. We illustrate this 
step using the following example program: 



proc foo(a, b) returns c {
  l1:
    goto l2, l3;
  l2:
    assume b > 0;
    c := a + 1;
    goto l4;
  l3:
    assume b <= 0;
    c := a - 1;
    goto l4;
  l4:
}

proc bar(x) returns z {
  l1:
    z := call foo(x, 1);
}

Applying the algorithm AlgPC to the procedure foo will result in a summary 
like: 

$$Sum := \big((a_0 = 0 \wedge b_0 = 0) \wedge (c_1 = a_0 - 1)\big) \vee \big((a_0 = 0 \wedge b_0 = 1) \wedge (c_1 = a_0 + 1)\big)$$

This summary can be used to replace the call statement in bar after the single 
static assignment has been performed as follows: 

proc bar(x) returns z {
  l1:
    assume a0 = x;
    assume b0 = 1;
    assume Sum;
    assume z1 = c1;
}

Note that, to avoid recomputing the single static assignment when reaching a 
call statement, we increment the incarnation count of each variable that might 
be modified by the called procedure and of each global variable. 
Therefore, we have to add frame conditions if a global variable is not changed 
by the summary (in this example this is not necessary, as there are no global 
variables). 
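
The call replacement can be sketched as a small Python helper that emits the framing assumptions. The function name replace_call, its parameters, and the representation of statements as strings are assumptions made for this illustration. Frame conditions for global variables are omitted, since the example has none.

```python
def replace_call(params, args, result_var, ret_var, incarnation):
    """Return assume statements replacing `result_var := call P(args)`.

    params:      formal parameters of the callee, e.g. ["a", "b"]
    args:        argument expressions at the call site, e.g. ["x", "1"]
    result_var:  caller variable that receives the result, e.g. "z"
    ret_var:     last incarnation of the callee's return variable, e.g. "c1"
    incarnation: dict, caller variable -> incarnation count (updated in place)
    """
    # Bind the first incarnation of each formal parameter to its argument.
    stmts = [f"assume {p}0 = {a};" for p, a in zip(params, args)]
    stmts.append("assume Sum;")
    # The call may modify result_var, so it receives a fresh incarnation
    # instead of triggering a recomputation of the single static assignment.
    incarnation[result_var] = incarnation.get(result_var, 0) + 1
    stmts.append(f"assume {result_var}{incarnation[result_var]} = {ret_var};")
    return stmts


for stmt in replace_call(["a", "b"], ["x", "1"], "z", "c1", {"z": 0}):
    print(stmt)
```

For the call in bar, this prints the four assume statements of the transformed procedure shown above.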

A procedure summary can be seen as a switch case over possible input values. 
That is, the summary provides the return values for a particular set of input 
values to the called procedure. Any execution that calls the procedure with 
other input values becomes infeasible. In that sense, using procedure summaries 
is an under-approximation of the set of feasible executions and thus sound for 
our purpose. 
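
The switch-case reading can be made concrete with a lookup table. The following Python sketch is only an illustration: the table FOO_SUMMARY holds hypothetical summary entries for foo, and the helper names are not part of the method.

```python
# Hypothetical summary of foo: (a0, b0) -> c1, one entry per feasible path.
FOO_SUMMARY = {(0, 0): -1, (0, 1): 1}


def foo(a, b):
    """Reference semantics of the example procedure foo."""
    return a + 1 if b > 0 else a - 1


def call_via_summary(summary, a, b):
    """Return the summarized result, or None if the execution is blocked."""
    return summary.get((a, b))


def is_under_approximation(summary):
    """Check that every summarized execution is also an execution of foo."""
    return all(foo(a, b) == c for (a, b), c in summary.items())


print(call_via_summary(FOO_SUMMARY, 0, 1))   # input covered by the summary
print(call_via_summary(FOO_SUMMARY, 5, 1))   # input not covered: blocked
print(is_under_approximation(FOO_SUMMARY))
```

Inputs outside the table yield no result, which mirrors an execution that becomes infeasible. The final check passes only if the summary adds no behavior that foo does not have.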

Lemma 3 (Soundness). Let P be a loop-free procedure which calls another 
loop-free procedure P', and let P# be the version of procedure P where all calls to 
P' have been replaced by the summary of P'. Any feasible execution of P# is 
also a feasible execution of P. 

Using these summaries is a very strong abstraction: only a very limited 
number of possible input values is considered, because the set of feasible executions of 
the called procedure is reduced to one per control-flow path (or even fewer, if 
algorithm AlgSC is used). In particular, this causes problems if a procedure is called 
with constant values as arguments. In the example above, replacing the call only works if 
the theorem prover picks the same constant when computing the summary that 
is used on the caller side (which is the case here). If the constants do not match, 
the summary might provide no feasible path through the procedure, which is 
still sound but not useful. In that case, a new summary has to be computed 
where the constant values from the caller side are used as a precondition for the 
procedure (e.g., by adding an appropriate assume statement to the first block of 
the called procedure) before re-applying algorithm AlgPC or AlgSC. 
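
The recomputation step can be sketched as follows. This is an illustration under strong assumptions: a brute-force search over a tiny input domain stands in for the theorem prover, and the names foo_path, summarize, and summary_for_call are hypothetical.

```python
from itertools import product


def foo_path(a, b):
    """Return (path label, result) for the example procedure foo."""
    return ("l2", a + 1) if b > 0 else ("l3", a - 1)


def summarize(pre=lambda a, b: True, domain=range(-2, 3)):
    """One (inputs -> result) entry per feasible path that satisfies `pre`.

    Stand-in for AlgPC: enumeration of a small domain replaces the prover.
    """
    summary, seen = {}, set()
    for a, b in product(domain, repeat=2):
        path, c = foo_path(a, b)
        if pre(a, b) and path not in seen:
            seen.add(path)
            summary[(a, b)] = c
    return summary


def summary_for_call(summary, b_const):
    """Reuse `summary` if an entry matches the constant argument b_const;
    otherwise recompute it with b == b_const as a precondition."""
    if any(b == b_const for (_, b) in summary):
        return summary
    return summarize(pre=lambda a, b: b == b_const)


generic = summarize()
print(summary_for_call(generic, 1))  # an entry with b == 1 exists: reused
print(summary_for_call(generic, 2))  # no entry with b == 2: recomputed
```

With the precondition b == 2, only the path through l2 remains feasible, so the recomputed summary has a single entry.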

The benefit of this summary computation is that it is fully automatic and the 
computation of the summary is relatively cheap, because the called procedure 
has to be analyzed at least once anyway. However, it is not a silver bullet and 
its practical value has to be evaluated in our future work. We do not consider 
procedure summaries as an efficient optimization. They rather are a necessary 
abstraction to keep our method scalable. 

7 Experiments 

We have implemented a prototype of the presented algorithms. As this prototype 
is still in a very early stage of development, the goal of these experiments is only 
to evaluate the computation time of the queries needed to cover all feasible 
statements in comparison to similar approaches. Other experiments, such as the 
applicability to real-world software, remain part of future work. 

We compare the algorithms from Section 5 with two other approaches that 
compute a covering set of feasible executions: a worst-case optimal approach 
AlgFM from [16] and a query-optimal approach AlgVSTTE from [3]. The 
worst-case optimal approach checks if there exists a feasible control-flow path 
passing each minimal block. A block is minimal if there exists no block that is 
passed by a strict subset of the executions passing this block [15,4]. Each 
implementation uses helper variables to build queries that ask the theorem prover for 
the existence of a path passing through one particular block. The query-optimal 
approach [3] uses helper variables to count how many minimal elements occur 
on one feasible execution and then applies a greedy strategy to cover as many 
minimal elements as possible with one valuation of the formula. Note that the 
purpose of AlgFM and AlgVSTTE is slightly different from the purpose of the 
algorithms in this paper. AlgFM and AlgVSTTE use a loop-free abstraction 
of the input program that over-approximates the set of feasible executions 
of the original program (see [15]). On this abstraction they prove the existence 
of blocks which cannot be part of any terminating execution. To be comparable, 
we use the same abstraction for all algorithms. That is, we use AlgFM and 
AlgVSTTE to check the existence of statements that do not occur on feasible 
executions. Since both algorithms are complete for loop-free programs, they 
return the same result as AlgSC. 

Note that the results and purposes of the algorithms differ slightly. However, 
all of them use a theorem prover as an oracle to identify executions that 
cover all feasible statements in a program. 

For now, our implementation works only for the simple language from 
Section 2. An implementation for a high-level language is not yet available. Hence 
the purpose of the experiments is only to measure the efficiency of the queries. 
Therefore, we decided to use randomly generated programs as input data. 
Generated programs have several benefits. We can control the size and shape of the 
program, we can generate an arbitrary number of different programs that share 
some property (e.g., number of control-flow diamonds), and they often have lots 
of infeasible control-flow paths. We are aware that randomly generated input is 
a controversial issue when evaluating research results, but we believe that, as 
we want to evaluate the performance of the algorithms, and not their detection 
rate or practical use, they are a good choice. A more technical discussion on this 
issue follows in the threats to validity. 

Experimental Setup. As experimental data, we use 80 randomly generated 
unstructured programs. Each program has between 2 and 9 control-flow diamond 
shapes, and each diamond shape has 2 levels of nested if-then-else blocks (i.e., 
there are 4 distinct paths through each diamond). A block has 3 statements, 
which are either assignments of (linear arithmetic) expressions to unbounded 
integer variables or assumptions guarding the conditional choice. Each program 
has between 90 and 350 lines of code and modifies between 10 and 20 different 
variables. 

Algorithm    Queries (total)    Time (sec)
AlgPC        69777              13.12
AlgSC        854                11.81
AlgFM        1760               145.45
AlgVSTTE     512                615.51

Table 1. Comparison of the four algorithms in terms of total number of queries 
and computation time for 80 benchmark programs. 

For each number of control-flow diamonds, we generated 10 different ran- 
dom programs and computed the average run-time of the algorithms. This is 
necessary to get an estimate of the performance of each algorithm, as their com- 
putation time strongly depends on the overall number of feasible executions in 
the analyzed program. 
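
To give a sense of scale, the number of control-flow paths can be bounded with a short calculation. The sketch below assumes that the diamonds are composed sequentially, which the setup does not state explicitly. The function name path_count is hypothetical, and the result is an upper bound because many of these paths may be infeasible.

```python
def path_count(diamonds, paths_per_diamond=4):
    """Upper bound on the control-flow paths through a chain of diamonds.

    Assumes sequential composition, so the choices multiply.
    """
    return paths_per_diamond ** diamonds


for k in range(2, 10):
    print(k, path_count(k))
```

Under this assumption the bound grows from 16 paths for 2 diamonds to 262144 paths for 9 diamonds.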

For a fair comparison, we use the theorem prover SMTInterpol 3 in all four 
algorithms. For each algorithm, we record how often the theorem prover is asked 
to check the satisfiability of a formula and we record the time it takes until 
the theorem prover returns with a result. All experiments are carried out on a 
standard desktop computer with an ample amount of memory. 

Discussion. Table 1 shows the summary of the results for all algorithms after 
analyzing 80 benchmark programs. Figure 5 gives a more detailed view on the 
computation time per program. The x-axis scales over the number of control-flow 
diamonds ranging from 2 diamonds to 9. Figure 6 gives a detailed view on the 
number of queries. As before, the x-axis scales over the number of control-flow 
diamonds. 

The algorithms AlgPC and AlgSC are clearly faster than AlgFM and 
AlgVSTTE. Overall, AlgSC tends to be the fastest one. Figure 4 shows the 
computation time for AlgPC and AlgSC in a higher resolution. It turns out 
that the difference between the computation time of AlgPC and AlgSC tends 
to become bigger for larger programs. As expected, AlgSC works a bit more 
efficiently, as the size of its formula is always bounded, while AlgPC asserts one 
new term for every counterexample found. However, comparing the number of 
theorem prover calls, there is a huge difference between AlgPC and AlgSC. 
While AlgSC never exceeds a total of 20 queries per program, AlgPC skyrockets 
already for small programs. For a program with 10 control-flow diamonds, 
AlgPC uses more than 2000 theorem prover calls, where AlgSC only needs 10. 
Still, AlgSC is only 0.03 seconds faster on this example (< 10%). 
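
The difference in query counts follows from the shape of the two query loops, which can be sketched in Python. This is a toy model and not the implementation used for the measurements: the set of feasible paths is given explicitly, and the hypothetical function check_sat stands in for the theorem prover by returning the first path that satisfies a constraint.

```python
from itertools import product


def check_sat(feasible_paths, constraint):
    """Toy oracle: return a feasible path satisfying `constraint`, else None."""
    return next((p for p in feasible_paths if constraint(p)), None)


def alg_pc(feasible_paths):
    """Path cover: block each path found until the oracle reports unsat."""
    cover, queries = [], 0
    while True:
        queries += 1
        path = check_sat(feasible_paths, lambda p: p not in cover)
        if path is None:
            return cover, queries
        cover.append(path)  # corresponds to asserting one blocking term


def alg_sc(feasible_paths, blocks):
    """Statement cover: ask only for paths through a block not yet covered."""
    cover, queries, uncovered = [], 0, set(blocks)
    while uncovered:
        queries += 1
        path = check_sat(feasible_paths, lambda p: uncovered & set(p))
        if path is None:
            break  # the remaining blocks lie on no feasible path
        cover.append(path)
        uncovered -= set(path)
    return cover, queries, uncovered


# Two sequential two-way branches, plus one block that no path reaches.
paths = list(product(["a1", "a2"], ["b1", "b2"]))
print(alg_pc(paths)[1])
print(alg_sc(paths, ["a1", "a2", "b1", "b2", "dead"])[1:])
```

In this model the path cover needs one query per feasible path plus a final unsatisfiable query, whereas the statement cover needs at most one query per block plus one. The toy oracle says nothing about the time spent per query.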

Fig. 4. Runtime comparison of the algorithms proposed in Section 5. The ticks 
on the x-axis represent the number of control-flow diamonds in the randomly 
generated programs. 

These results show that, even though AlgSC might be slightly more efficient 
than AlgPC, the number of queries is not an important factor for the computation 
time. In fact, internally, the theorem prover tries to find a new counterexample 
by changing as few variables as possible, which is very close to the idea of 
AlgPC. AlgSC, which queries if there exists a counterexample through a block 
that has not been visited so far, will internally perform the same steps as AlgPC, 
and thus the performance gain is only rooted in the smaller formulas and reduced 
communication between application and prover. However, the results also 
show that, when using a theorem prover, computing a path cover with AlgPC 
is not significantly more expensive than computing only a statement cover with 
AlgSC. 

3 http://ultimate.informatik.uni-freiburg.de/smtinterpol/ 

The computation times for AlgFM and AlgVSTTE are significantly higher 
than those for the presented algorithms. For AlgFM, some queries, and thus 
some computation time, could be saved by utilizing the counterexamples to 
avoid redundant queries. However, the number of queries cannot become better 
than that of AlgSC due to the kind of queries. The most significant benefit of 
AlgPC and AlgSC over AlgFM is that they do not have to inject helper variables 
into the program. In fact, AlgPC and AlgSC also use one variable per block to 
encode the reachability, but this variable is added to the formula and not to the 
program. Thus, it is not considered during single static assignment, which would 
create multiple copies for each variable. 

For the query-optimal algorithm AlgVSTTE, the computation time becomes 
extremely large for our random programs. This is due to the fact that 
AlgVSTTE tries to find the best possible counterexample (that is, the one with 
the most previously uncovered blocks) with each query. Internally, the theorem 
prover will exercise several counterexamples and discard them until the best one 
is found. The procedure is similar to the one used in AlgPC and AlgSC: the 
theorem prover computes a counterexample, then ensures that this example 
cannot be found again, and then starts over. But in contrast to AlgVSTTE, 
our algorithms do not force the theorem prover to find a path that satisfies 
additional constraints, which relaxes the problem that has to be solved by 
the theorem prover. Even though one might find benchmarks where AlgVSTTE 
is significantly faster than AlgFM, the algorithms AlgPC and AlgSC will always 
be more efficient since they pose easier (and, hence, faster) queries to the 
theorem prover. 

Fig. 5. Computation time for each algorithm. The ticks on the x-axis represent 
the number of control-flow diamonds in the randomly generated programs. 

The presented results should not be interpreted as an argument against a 
query-optimal algorithm. We rather conclude that the place for such optimizations 
is inside the theorem prover. Modifying the way the theorem prover finds 
a new counterexample can lead to tremendous performance improvements. However, 
such changes have to consider the structure of verification conditions and 
thus will exceed the functionality of a general theorem prover. 

Threats to validity. We emphasize that the purpose of the experiments is only 
to evaluate the performance of AlgPC and AlgSC. These experiments are not 
valid to reason about practical use or scalability of the method. 

We report several internal threats to validity: The experiments only used a 
very restricted background theory. However, the path reasoning described in this 
paper prunes the search space for the theorem prover even if we use richer logics 
including arrays or quantifiers. As shown in our experiments, the algorithms 
proposed in this paper pose easier problems to a theorem prover. This will not 
change if we switch to richer logics since our algorithms only limit the theorem 
prover to reason about feasible paths while all other algorithms pose additional 
constraints on such a path. If we use richer logics we only limit the number of 
paths. But still it remains easier to just find a path than to find one that satisfies 
some additional condition. 

Fig. 6. Number of calls to the theorem prover for each algorithm. The ticks on the 
x-axis represent the number of control-flow diamonds in the randomly generated 
programs. 

We have chosen randomly generated programs as input for two reasons. First, 
we wanted to be able to scale the number of paths and use the most difficult 
shape of the control structure for our techniques. Hence, we had to scale the 
number of diamonds in the control-flow graph. Second, we did not implement 
a parser for a specific language. Existing translations from high-level languages 
into unstructured languages are not suitable for our algorithms as they over- 
approximate the set of infeasible executions to retain soundness w.r.t. partial 
correctness proofs. These translations might both over- and under-approximate 
the set of feasible executions of a program and thus violate our notion of sound- 
ness. However, for the purpose of comparing the performance of the different 
algorithms, the experiments are still valid. 

In our experiments we only used SMTInterpol to answer the queries. For 
the comparison of AlgPC with the other algorithms, the choice of the theorem 
prover can make a significant difference. SMTInterpol tries to find a valuation 
for a formula by making as few changes as possible to the previous valuation. If a 
theorem prover chooses a different strategy, AlgSC in particular might become 
much faster. However, we are not aware of any theorem prover that uses this kind 
of strategy. 

8 Related Work 

Automatic test case generation is a wide field ranging from purely random gen- 
eration of input values (e.g., [22]) to complex static analysis. The presented 
algorithms can best be compared to tools that provide automatic white-box test 
case generation. Probably the most notable tools in this field are PREfix [6] and 
Pex [23]. Both tools use symbolic execution to generate test cases that 
provoke a particular behavior. Pex further allows the specification of parameterized 
unit tests. Symbolic execution analyzes a program path-by-path and then 
uses constraint solving to identify adequate input to execute this path. In con- 
trast, our approach encodes all paths into one first-order formula and then calls 
a theorem prover to return any path and the input values needed to execute 
this path. In a way, symbolic execution selects a path and then searches feasible 
input values for this path, while our approach just asks the theorem prover for 
any path which is feasible. One advantage of our approach is that it might be 
more efficient to ask the theorem prover for a feasible path than checking for 
each path if it is feasible. 

Many other approaches to static analysis-based automatic test case genera- 
tion and bounded model checking exist but, due to the early stage of the devel- 
opment of the proposed ideas, a detailed comparison is subject to future work. 

In [10] test cases are generated from interactive correctness proofs. The ap- 
proach of using techniques from verification to identify feasible control-flow paths 
for test case generation is similar to ours. However, they generate test cases from 
a correctness proof, which might contain an over-approximation of the feasible 
executions. This can result in non-executable test cases. Our approach under- 
approximates the set of feasible executions and thus, any of the generated test 
cases can be executed. 

Using a first-order formula representation of a program and a theorem prover 
to identify particular paths in that program goes back to, e.g., ESC [11] and, more 
recently, Boogie [19,12,1]. These approaches use similar program transformation 
steps to generate the formula representation of a program. However, the purpose 
of these approaches is to show the absence of failing executions. Therefore, their 
formula represents an approximation of the weakest precondition of the program 
with postcondition true. In contrast, we use the negated wlp with postcondition 
false. Showing the absence of failing executions is a more complicated task and 
requires a user-provided specification of the intended behavior of the program. 

In [14], Grigore et al. propose to use the strongest postcondition instead of 
the weakest precondition. This would also be possible for our approach. As 
mentioned in Section 4, the reachability variables are used to avoid encoding the 
complete strongest postcondition. However, it would be possible to use sp and 
modify the reachability variables to encode wlp. 

Recently there has been some research on wlp-based program analysis: in [17], 
an algorithm to detect unreachable code is presented. This algorithm can be seen 
as a variation of AlgSC. However, it does not return test cases. The algorithms 
AlgFM [16,15] and AlgVSTTE [3] detect code which never occurs on feasible 
executions. While AlgFM detects doomed program points, i.e., control locations, 
AlgVSTTE detects statements, i.e., edges in the CFG. If a piece of code cannot 
be proved doomed/infeasible, a counterexample is obtained which represents 
a normally terminating execution. The main difference to our approach is that 
their formula is satisfied by all executions that either block or fail. We do not 
consider that an execution might fail and leave this to the execution of the test 
case. 

There are several strategies to cover control-flow graphs. The most related to 
this work is [3], which has already been explained above. Other algorithms, such 
as [5,4,13], present strategies to compute feasible path covers efficiently. These 
algorithms use dynamic analysis and are therefore not complete. 

Lahiri et al. [18] used a procedure similar to one of our proposed algorithms 
to efficiently compute predicate abstraction. They used an AllSMT loop over 
a set of important predicates. One of our algorithms, AlgPC, lifts this idea to 
the context of test case generation and path coverage. Our second algorithm, 
AlgSC, cannot be used in their context since the authors of [18] need to 
get all satisfying assignments for the set of predicates. In contrast, we are only 
interested in the set of predicates that are satisfied in at least one model of the 
SMT solver. 



9 Conclusion 

We have presented two algorithms to compute test cases that cover those 
statements or control-flow paths, respectively, which have feasible executions within a 
certain number of loop unwindings. The algorithms compute a set of test cases 
in a fully automatic way without requiring any user-provided information about 
the environment of the program. The algorithms guarantee that these executions 
also exist in the original program (with loops). We further have presented a fully 
automatic way to compute procedure summaries, which gives our algorithm the 
potential to scale even to larger programs. 

If no procedure summaries are used, the presented algorithms cover all state- 
ments/paths with feasible executions within the selected number of unwindings. 
That is, besides returning test cases for the feasible statements/paths one ma- 
jor result is that all statements that are not covered cannot be covered by any 
execution and thus are dead code. 

The experiments show that the preliminary implementation already is able 
to outperform existing approaches that perform similar tasks. The experiments 
also show that computing a feasible path cover is almost as efficient as computing 
a feasible statement cover with the used oracle even for procedures of up to 300 
lines of code. 

Due to the early stage of development there are still some limitations which 
prevent us from reporting a practical use of the proposed algorithms. So far, we 
do not have a proper translation from a high-level programming language into our 
intermediate format. Current translations into unstructured intermediate 
verification languages such as Boogie [1] are built to preserve all failing executions of 
a program for the purpose of proving partial correctness. However, these 
translations add feasible executions to the program during translation, which breaks 
our notion of soundness. Further, our language does not support assertions. 
Runtime errors are guarded using conditional choice to give the test case generation 
the possibility to generate test cases that provoke runtime errors. A reasonable 
translation which only under-approximates feasible executions is still part of our 
future research. 

Another problem is our oracle. Theorem provers are limited in their ability 
to find satisfying valuations for verification conditions. If the program contains, 
e.g., non-linear arithmetic, a theorem prover will not be able to find a valua- 
tion in every case. This does not affect the soundness of our approach, but it 
will prevent the algorithm from covering all feasible paths (i.e., the approach 
is not complete anymore). To make these algorithms applicable to real world 
programs, a combination with dynamic analysis might be required to identify 
feasible executions for those parts where the code is not available, or where the 
theorem prover is inconclusive. 

Future Work. Our future work encompasses the development of a proper transla- 
tion from Java into our unstructured language. This step is essential to evaluate 
the practical use of the proposed method and to extend its use to other appli- 
cations. 

One problem when analyzing real programs is intellectual property bound- 
aries and the availability of code of third-party libraries. We plan to develop 
a combination of this approach with random testing (e.g., [21]), where random 
testing is used to compute procedure summaries for library procedure(s) where 
we cannot access the code. 

The proposed procedure summaries have to be recomputed if the available 
summaries for a procedure do not represent any feasible execution in the current 
calling context. Therefore, we plan to develop a refinement loop which stores 
summaries more efficiently. 

Another application would be to change the reachability variables in a way 
that they are only true if an assertion inside a block fails rather than if the block 
is reached. This would allow us to identify all paths that violate assertions in 
the loop-free program. Encoding failing assertions this way can be seen as an 
extension of the work of Leino et al. in [20]. 

In the theorem prover, further optimizations could be made to improve the 
performance of AlgSC. Implementing a strategy to find new valuations that, 
e.g., change as many reachability variables as possible from the last valuation 
could lead to a much faster computation of a feasible statement cover. In the 
future, we plan to implement a variation of the algorithm AlgVSTTE [3] inside 
the theorem prover. 

We believe that the presented method can be a powerful extension to dynamic 
program analysis by providing information about which parts of a program can 
be executed within the given unwinding, what valuation is needed to execute 
them, and which parts can never be executed. The major benefit of this kind 
of program analysis is that it is user friendly in a way that it does not require 
any input besides the program and that any output refers to a real execution in 
the program. That is, it can be used without any extra work and without any 
expert knowledge. However, more work is required to find practical evidence for 
the usefulness of the presented ideas. 